Working nodeFit feature #559
Conversation
Hi @RyanDevlin. Thanks for your PR. I'm waiting for a kubernetes-sigs member to verify that this patch is reasonable to test. If it is, they should reply with `/ok-to-test` on its own line. Once the patch is verified, the new status will be reflected by the `ok-to-test` label.

I understand the commands that are listed here. Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.
/assign @damemi @ingvagabund Could you please review my PR? Specifically @damemi, I made some changes to the TopologySpreadConstraint strategy to avoid checking for node affinity and taints twice (see line 290 of topologyspreadconstraint.go). I'm fairly certain my changes didn't affect the functionality of the algorithm, but I wanted to double-check with you.
/ok-to-test
@RyanDevlin It seems that you forgot to run `./hack/update-conversions.sh`:
```
./hack/verify-spelling.sh
./hack/verify-toc.sh
./hack/verify-conversions.sh
Generated output differs:
diff -Naupr pkg/api/v1alpha1/zz_generated.conversion.go ./hack/../_tmp/kube-verify.vLZmmb/descheduler/pkg/api/v1alpha1/zz_generated.conversion.go
--- pkg/api/v1alpha1/zz_generated.conversion.go	2021-04-30 11:32:04.717979648 +0000
+++ ./hack/../_tmp/kube-verify.vLZmmb/descheduler/pkg/api/v1alpha1/zz_generated.conversion.go	2021-04-30 11:34:02.331140501 +0000
@@ -294,6 +294,7 @@ func autoConvert_v1alpha1_StrategyParame
 	out.ThresholdPriority = (*int32)(unsafe.Pointer(in.ThresholdPriority))
 	out.ThresholdPriorityClassName = in.ThresholdPriorityClassName
 	out.LabelSelector = (*v1.LabelSelector)(unsafe.Pointer(in.LabelSelector))
+	out.NodeFit = in.NodeFit
 	return nil
 }
@@ -313,6 +314,7 @@ func autoConvert_api_StrategyParameters_
 	out.ThresholdPriority = (*int32)(unsafe.Pointer(in.ThresholdPriority))
 	out.ThresholdPriorityClassName = in.ThresholdPriorityClassName
 	out.LabelSelector = (*v1.LabelSelector)(unsafe.Pointer(in.LabelSelector))
+	out.NodeFit = in.NodeFit
 	return nil
 }
Generated conversions verify failed. Please run ./hack/update-conversions.sh
make: *** [Makefile:124: verify-gen] Error 1
```
`./hack/updat-conversions.sh` needs to be run
@RyanDevlin #551 is more about deciding what's the average number of duplicates per node rather than checking if a pod can be evicted. The issue is about avoiding unnecessary evictions in the first place.
@damemi I can't figure this one out.

@ingvagabund In the description of #551 it says "Take at least node taints (e.g. taints repealing pods from master nodes) into account when selecting pods for eviction. If possible, take node selector into account as well". I'm a bit confused about how that differs from what my feature does.
Sorry, the script should be `./hack/update-conversions.sh`.
@damemi On my end, I think the problem is that the generated output differs from what I have committed. Running the update script again doesn't resolve the failure for me.
@RyanDevlin that's odd, I checked out your branch and had no issues, with the only change being those 2 lines of `zz_generated.conversion.go`.
go.sum (Outdated)

```
@@ -106,6 +106,7 @@ github.com/evanphx/json-patch v4.9.0+incompatible/go.mod h1:50XU6AFN0ol/bzJsmQLi
 github.com/fatih/color v1.7.0/go.mod h1:Zm6kSWBoL9eyXnKyktHP6abPY2pDugNf5KwzbycvMj4=
 github.com/form3tech-oss/jwt-go v3.2.2+incompatible h1:TcekIExNqud5crz4xD2pavyTgWiPvpYe4Xau31I0PRk=
 github.com/form3tech-oss/jwt-go v3.2.2+incompatible/go.mod h1:pbq4aXjuKjdthFRnoDwaVPLA+WlJuPGy+QneDUgJi2k=
+github.com/fsnotify/fsnotify v1.4.7 h1:IXs+QLmnXW2CcXuY+8Mzv/fWEsPGWxqefPtCP5CnV9I=
```
Odd that there is a `go.sum` change in this PR. You are also committing the `kind` binary below this; it won't let me comment there, but please remove that too.
Oh, that was dumb. I removed the `go.sum` line and the `kind` binary.
@damemi I made sure I was rebased on the upstream master, deleted the previously generated files, and re-ran the script. It produced a nested `sigs.k8s.io/` directory inside my checkout:

```
rdevlin@localhost:~/go/src/sigs.k8s.io/descheduler$ tree sigs.k8s.io/descheduler/pkg/
sigs.k8s.io/descheduler/pkg/
├── api
│   ├── v1alpha1
│   │   ├── zz_generated.conversion.go
│   │   ├── zz_generated.deepcopy.go
│   │   └── zz_generated.defaults.go
│   └── zz_generated.deepcopy.go
└── apis
    └── componentconfig
        ├── v1alpha1
        │   ├── zz_generated.conversion.go
        │   ├── zz_generated.deepcopy.go
        │   └── zz_generated.defaults.go
        └── zz_generated.deepcopy.go

5 directories, 8 files
```
#551 is focused mainly on adjusting https://github.com/kubernetes-sigs/descheduler/blob/master/pkg/descheduler/strategies/duplicates.go#L107. As described in the description:

> Currently, all 6 nodes are taken into account. So the average number of pods per node is 2. The idea is to adjust the computation so that only nodes the duplicate pods could actually be scheduled to are counted.
@ingvagabund I see what you mean, thanks for that link! I'll remove #551 from the description of the PR.
Force-pushed from 962d562 to 7cf9f12 (compare)
@damemi I'm still unable to resolve my generated conversions issue, but I have some more information if it helps. You can see the latest build logs here: https://prow.k8s.io/view/gs/kubernetes-jenkins/pr-logs/pull/kubernetes-sigs_descheduler/559/pull-descheduler-unit-test-master-master/1389200208998436864. The log file shows all the places in my code where the build reports undefined errors.

You can see these errors yourself if you fetch my code and run the unit tests. So far I haven't been able to figure out the cause of these errors. If there is something I'm doing wrong, or something I should try, please let me know.
@RyanDevlin in that Prow output I see undefined-type errors coming from the generated files. If so, something is totally broken. The reason you're getting undefined errors is that those generated files don't have any type definitions to go along with them. In other words, that script should just be updating the existing `zz_generated` files in place. So, first off we need to delete the nested `sigs.k8s.io/` directory that was generated inside your checkout.
@RyanDevlin sorry for the late review. I finally found more time to check your changes more closely.
@ingvagabund Thank you for taking the time to review my PR! I've resolved all of your comments except a few. These are left open with questions above. Please let me know if there is anything I missed.
Force-pushed from 27becb6 to 0e73e08 (compare)
/retest
@ingvagabund have you had time to review my comments above? Specifically the second comment about strategies that filtered for node taints and affinity? Just to reiterate, my line of thinking was that the goal of this feature was to get all strategies in line with a uniform method for considering taints, affinity, etc. before descheduling. Some strategies already performed a subset of these checks before considering pods for descheduling. In order to make this process more uniform, I removed some of the filtering for taints, etc. from those strategies.

My assumption was that, if a user desired more advanced filtering, they would turn on the `nodeFit` feature.
/retest
I like the point you are making. I am in favor of simplifying the code. On the other hand, we can't change the default behavior just because of that. I'd rather wait for a release or two and make it enabled by default. Then eventually remove the gate completely.
@ingvagabund That's a fair point. In that case, I can revert any filtering changes I made to the strategies for now. Also, I'll wait to hear from @damemi @lixiang233 @seanmalloy for extra thoughts on this before pushing anything. For this type of follow-up task, should I file a new issue so we can keep track of the future work?
I'd like to hear from others as well about what they think. Filing a tracking issue sounds good if agreed.
Yeah, let's not change the default behavior right now. Tracking it over the next 2 releases to be enabled by default makes sense, and falls in line with upstream's feature-gating approach.
The rest looks good (I checked the unit tests pretty fast this time, they will need some refactoring soon though, it's getting harder to review changes in those)
@ingvagabund @damemi Thanks for the input! I've restored the original functionality of the strategies. I also agree that the tests are becoming exceedingly complex, especially the e2e tests. Lastly, I've opened issue #574 to track the removal of the optimization performed by this feature from the individual strategies. Hopefully that covers everything for this PR, but if not, please comment and I will make changes!
/approve
I'll leave any final points to @ingvagabund or @seanmalloy
[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: damemi, RyanDevlin

The full list of commands accepted by this bot can be found here. The pull request process is described here.

Needs approval from an approver in each of these files:
Approvers can indicate their approval by writing `/approve` in a comment. Approvers can cancel approval by writing `/approve cancel` in a comment.
/lgtm

@RyanDevlin thank you for the patience.
@damemi @ingvagabund Thank you both for the review!
Working nodeFit feature

Implements #529.

This feature adds a `nodeFit` boolean parameter to all descheduler strategies. When `nodeFit` is set to `true`, the descheduler strategy will take "node fit" into account when evicting pods. This increases the probability that pods can be re-scheduled to a new node successfully upon eviction.

For the current implementation, "node fit" is defined by the following criteria:

- A `nodeSelector` on the pod
- Any `Tolerations` on the pod and any `Taints` on the other nodes
- `nodeAffinity` on the pod
- Whether any of the other nodes are marked as `unschedulable`